Econometrics and Statistics (520K)


Econometric Methods in
Business and Economics


Practical Class (Winter 24/25)

by Domenic Franjic


Exercise Sheet No 04


Linear Regression with One Regressor (Part 3)

Set-Up

1) Create your (working) directory

Create a new folder on your computer in which you can save all necessary files, i.e., the data sets, the R script with the implementation of your analysis, and the helper-function script provided by us.

This folder, i.e., the path or directory of this folder, will be your so-called working directory.


2) Download the data set

Download the data sets CASchools_01_data.txt and CPS_01_data.txt from ILIAS directly or by using the link below:

Download CASchools_01_data.txt

Download CPS_01_data.txt

Save the data sets into a (working) directory of your choice.


3) Prepare script

Open a new R script and save it into your (working) directory.

Run the following code chunk to prepare your R script, i.e., to clear the work space and set your working directory.

# Clear workspace
rm(list = ls(all = TRUE))
# Set working directory
setwd(".../set/your/working/directory/...")

X) Install libraries

Run the following code chunk to make sure that all necessary libraries are installed.

Note: you have to run this code chunk only once. After it has run, all libraries necessary for the class are installed, and we will load them whenever we need them.


4) Install/load packages and download/save/include helper function script

Step 1) Install all relevant R packages and download and save the R script r_helper_functions.R from ILIAS by running the following code chunk.

# Download R-helper functions
source("https://ilias.uni-hohenheim.de/data/UHOH/lm_data/lm_1856939/MA_EconometricMethods_WiSe2324_PracticalClass/r-scripts/prepare_r_packages_and_helper_functions.R")

Note: the required R packages are "moments", "sandwich", "lmtest", "car", "plm", "ivreg", "dynlm", "forecast", and "urca".

Step 2) Include the R script r_helper_functions.R into your R script by running the following code chunk.

# Include R-helper functions
source("r_helper_functions.R")

Note, more information on the helper functions can be found here.

Task 1


Student teacher ratio and student performance

We continue our analysis of the relationship between student performance and class size.

For this we again use the data set CASchools_01_data.txt and the two variables \(test score\) and \(STR\) as obtained on the first exercise sheet.


Variables

Variable Description
\(students\) Total enrollment.
\(teachers\) Number of teachers.
\(read\) Average reading score.
\(math\) Average math score.

Preparations

Load the data, construct the variables, and obtain the estimation results.


Load data set

# use absolute or relative path to load data
cas.dat <- read.table(file = "CASchools_01_data.txt",
                      header = TRUE,
                      sep = ",",
                      colClasses = c("numeric","numeric","numeric","numeric"))

Construct variables

# compute STR and append it to cas.dat
cas.dat$STR <- cas.dat$students/cas.dat$teachers 
# compute TestScore and append it to cas.dat
cas.dat$score <- (cas.dat$read + cas.dat$math)/2 

Estimation results No 01

# estimate linear model
lm.res.01 <- lm(score ~ STR, data = cas.dat)
# show results
lm.res.01

The estimation results below include robust standard errors which are not part of the results from the lm() function above.

\(\Rightarrow\) See: implementation below!

## 
## =============================
##            Model 1           
## -----------------------------
## intercept  698.93 (10.34) ***
## STR         -2.28  (0.52) ***
## -----------------------------
## R^2          0.05            
## SER         18.58            
## =============================
## *** p < 0.001; ** p < 0.01; * p < 0.05

Task a)

Replicate the results in Table 1 of the exercise sheet.


Estimate linear regression model

# Estimate linear regression model
lm.res.01 <- lm(score ~ STR, data = cas.dat)
summary(lm.res.01)
## 
## Call:
## lm(formula = score ~ STR, data = cas.dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -47.727 -14.251   0.483  12.822  48.540 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 698.9329     9.4675  73.825  < 2e-16 ***
## STR          -2.2798     0.4798  -4.751 2.78e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.58 on 418 degrees of freedom
## Multiple R-squared:  0.05124,    Adjusted R-squared:  0.04897 
## F-statistic: 22.58 on 1 and 418 DF,  p-value: 2.783e-06

Scatter Plot with Fitted Regression Line

# plot parameters
par(mfrow=c(1, 1),
    mar=c(5, 5, 3, 2))
# scatter plot
plot(cas.dat$STR, cas.dat$score,
     main = "Scatter Plot of Test Score and STR", 
     xlab = "STR (X)",
     ylab = "Test Score (Y)")
abline(lm.res.01, col = "red", lty = 2)


Standard errors robust w.r.t. heteroskedasticity: use the wrapper function lm_ct_fun() from the helper-function script included above

# call wrapper function for robust inference
lin.mod.rob <- lm_ct_fun(score ~ STR, data = cas.dat, hc.type = "HC1")
# print results
lin.mod.rob$ct
## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 698.93295   10.36436 67.4362 < 2.2e-16 ***
## STR          -2.27981    0.51949 -4.3886 1.447e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sampling distribution of \(\widehat{\beta}_1\)

\[\begin{align} \widehat{\beta}_1 \sim N\left(\beta_1, \sigma^2_{\widehat{\beta}_1} \right) \;\;\; \text{(approximately)} \end{align}\]

with (heteroskedasticity robust)

\[\begin{align} \sigma^2_{\widehat{\beta}_1}=\frac{1}{n}\frac{\mathrm{var}\left[\left(X_i-\mu_X\right)u_i\right]}{\left[\mathrm{var}\left(X_i\right)\right]^2} \end{align}\]

under homoskedasticity

\[\begin{align} \tilde{\sigma}^2_{\widehat{\beta}_1}=\frac{1}{n}\frac{\mathrm{var}\left[u_i\right]}{\mathrm{var}\left(X_i\right)} \end{align}\]
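As a check, both variance formulas can be computed by hand in R. This is a sketch, assuming cas.dat with STR and score has been constructed as in the Preparations above; under homoskedasticity \(\mathrm{var}\left[\left(X_i-\mu_X\right)u_i\right]=\mathrm{var}\left(X_i\right)\mathrm{var}\left(u_i\right)\), so the robust formula reduces to the second one. The resulting standard errors match the two reported below up to finite-sample degrees-of-freedom corrections.

```r
# sketch: standard errors of the STR slope computed from the two formulas
# (assumes cas.dat with STR and score as constructed above)
x <- cas.dat$STR
y <- cas.dat$score
n <- length(y)
u.hat <- residuals(lm(y ~ x))
# heteroskedasticity-robust variance of the slope estimator
var.rob <- (1 / n) * mean(((x - mean(x)) * u.hat)^2) / mean((x - mean(x))^2)^2
# homoskedasticity-only variance of the slope estimator
var.hom <- (1 / n) * mean(u.hat^2) / mean((x - mean(x))^2)
sqrt(c(robust = var.rob, homoskedastic = var.hom))
```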

Important

  • Computing heteroskedasticity robust SEs is slightly more involved, but the resulting inference is valid under both homoskedasticity and heteroskedasticity.
  • Heteroskedasticity robust SE are readily available in almost any statistics software.
  • \(\Rightarrow\) In case of doubt use heteroskedasticity robust standard error!

Add your interpretation here!

Task b)

Test whether \(STR\) has a significant effect on \(score\).


Conduct inference


Based on homoskedasticity only

# call wrapper function lm_ct_fun (here, using homoskedastic ses)
lm.ct.ord.01 <- lm_ct_fun(score ~ STR, data = cas.dat, hc.type = "const")
ct.ord.01 <- lm.ct.ord.01$ct
ct.ord.01
## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 698.93295    9.46749 73.8245 < 2.2e-16 ***
## STR          -2.27981    0.47983 -4.7513 2.783e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Robust w.r.t. heteroskedasticity

# call wrapper function lm_ct_fun (here, using robust ses)
lm.ct.rob.01 <- lm_ct_fun(score ~ STR, data = cas.dat, hc.type = "HC1")
ct.rob.01 <- lm.ct.rob.01$ct
ct.rob.01
## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 698.93295   10.36436 67.4362 < 2.2e-16 ***
## STR          -2.27981    0.51949 -4.3886 1.447e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Confidence interval based on homoskedasticity only standard errors

# 95% ci bounds based on homoskedasticity
ci.95.up.ord.01 <- ct.ord.01[,1] + 1.96 * ct.ord.01[,2] # upper bound
ci.95.lo.ord.01 <- ct.ord.01[,1] - 1.96 * ct.ord.01[,2] # lower bound
# results
cbind(ci.95.lo.ord.01, ci.95.up.ord.01)
##             ci.95.lo.ord.01 ci.95.up.ord.01
## (Intercept)      680.376667       717.48923
## STR               -3.220266        -1.33935

Confidence interval based on heteroskedasticity robust standard errors

# 95% ci bounds based on heteroskedasticity
ci.95.up.rob.01 <- ct.rob.01[,1] + 1.96 * ct.rob.01[,2] # upper bound
ci.95.lo.rob.01 <- ct.rob.01[,1] - 1.96 * ct.rob.01[,2] # lower bound
# results
cbind(ci.95.lo.rob.01, ci.95.up.rob.01)
##             ci.95.lo.rob.01 ci.95.up.rob.01
## (Intercept)      678.618800      719.247098
## STR               -3.298007       -1.261609

Interpretation

General (for slope parameter)

\[\begin{align} H_0&: \quad \beta_{1} = \beta_{1,0} \quad \text{vs.} \\ H_1&: \quad \beta_{1} \neq \beta_{1,0} \quad \alpha = 5\% \end{align}\]


Here

\[\begin{align} H_0&: \quad \beta_{STR} = 0 \quad \text{vs.} \\ H_1&: \quad \beta_{STR} \neq 0 \quad \alpha = 5\% \end{align}\]

Add your results here!

\[\begin{align} t^{act} &= \frac{\widehat{\beta}_{STR}^{act} - 0}{SE\left(\widehat{\beta}_{STR}\right)} \\ t^{act} &= \frac{-2.28 - 0}{0.519} \\ t^{act} &= -4.393 \end{align}\]

Add your results here!

  • \(|t^{act}| > 1.96\)
  • \(H_0\) can be rejected at 5% significance level.

Add your results here!

\[\begin{align} p-\text{value} &= 2*\Phi\left(-|t^{act}|\right) \\ &= 2*\Phi\left(-4.389\right) \\ &= 1.45\times 10^{-5} \end{align}\]

Add your results here!

Conclusion

  • \(p\text{-value}<0.05\)
  • \(H_0\) can be rejected at 5% significance level.

Definition \(p\)-value

  • The p-value is the probability of observing a value of \(\widehat{\beta}\) at least as different from \(\beta_{1,0}\) as the estimate actually computed \(\left(\widehat{\beta}^{act}_1\right)\), assuming that the null hypothesis is correct.
  • The significance level \(\alpha\) is the probability of a Type I error, i.e., rejecting \(H_0\) even though it is true.
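In R, the two-sided p-value from the normal approximation is a one-liner (a sketch; pnorm() is the standard normal CDF \(\Phi\)):

```r
# two-sided p-value via the standard normal cdf
t.act <- -4.39
2 * pnorm(-abs(t.act))
```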

Add your results here!

ill_hyp_tes_fun(t.act = -4.39, test = "two.sid")

  • The \(95\%\) confidence interval of \(\widehat{\beta}_{STR}\) does not contain zero.
  • Thus, the estimate \(\widehat{\beta}_{STR}\) is significant at a \(5\%\) significance level

Add your interpretation here!


Illustration

Illustration of hypothesis test


CDF Standard Normal Distribution

Source: Stock and Watson (2020), p. 763-764


Task c)

Related to the "superintendent's question" from the lecture, construct the \(95\%\) confidence interval for the effect of reducing the student-teacher ratio by \(2\).


Confidence interval for \(\underline{\Delta STR = -2}\)

# 95% ci bounds for delta STR = -2
# note: multiplying by a negative number swaps the upper and lower bounds
ci.95.d2.lo <- (ct.rob.01[2,1] + 1.96 * ct.rob.01[2,2]) * (-2) # lower bound
ci.95.d2.up <- (ct.rob.01[2,1] - 1.96 * ct.rob.01[2,2]) * (-2) # upper bound
# results
cbind(ci.95.d2.lo, ci.95.d2.up)
##      ci.95.d2.lo ci.95.d2.up
## [1,]    2.523218    6.596014

Interpretation

Confidence interval for \(\underline{\beta_1}\)

\[\begin{align} \left[\widehat{\beta}_1 \pm 1.96 * \text{SE}\left(\widehat{\beta}_1\right) \right] \end{align}\]


Confidence interval for \(\underline{\beta_1 \Delta x}\)

  • Based on the CI for \(\beta_1\) above
  • Predicted effect of \(\Delta x\) based on the lower CI bound of \(\beta_1\) is \(\widehat{\beta}_1 - 1.96 * \text{SE}\left(\widehat{\beta}_1\right)\)
  • Predicted effect of \(\Delta x\) based on the upper CI bound of \(\beta_1\) is \(\widehat{\beta}_1 + 1.96 * \text{SE}\left(\widehat{\beta}_1\right)\)

Thus,

\[\begin{align} \left[\left(\widehat{\beta}_1 \pm 1.96 * \text{SE}\left(\widehat{\beta}_1\right) \right)*\Delta x\right] \end{align}\]


Here

Use robust standard errors

\[\begin{align} CI &= \left[\left(-2.28\pm 1.96\times 0.519\right) \times \left( -2 \right) \right] \\ &= \left[2.523,6.596\right] \end{align}\]

Note that multiplying by \(-2\) swaps the roles of the upper and lower bounds. The \(95\%\) confidence interval for the effect does not include zero.

Add your interpretation here!

Task d)

Recode \(STR\) into a binary variable for small and large class sizes and replicate the analysis from Chapter 2.8 of the lecture.


Binary regression model

Binary regressors

  • \(X_i=1\) if small class size
  • \(X_i=0\) if not small class size, i.e. large class size

Regression model

\[\begin{align} Y_i=\beta_0 + \beta_1 X_i + u_i \end{align}\]

Coefficients

\[\begin{align} \mathrm{E}\left(Y_i|X_i=0\right)&=\beta_0 \\ \mathrm{E}\left(Y_i|X_i=1\right)&=\beta_0+\beta_1 \end{align}\]

\[\begin{align} \beta_1&=\mathrm{E}\left(Y_i|X_i=1\right)-\mathrm{E}\left(Y_i|X_i=0\right) \\ &= \text{population difference in group means} \end{align}\]

\(\Rightarrow\) Test if difference in means, i.e. \(\beta_1\), is different from zero!
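This equivalence is easy to verify in R. The following is a sketch, assuming cas.dat with STR and score has been constructed as in the Preparations above: the estimated slope on the dummy equals the difference in sample group means.

```r
# sketch: slope on a binary regressor equals the difference in group means
# (assumes cas.dat with STR and score as constructed above)
small <- round(cas.dat$STR, 2) < 20
coef(lm(cas.dat$score ~ small))["smallTRUE"]
mean(cas.dat$score[small]) - mean(cas.dat$score[!small])
# both lines print the same number (approx. 7.37, matching the table below)
```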



Group means

# recoding of STR into binary variable
cas.dat$small <- round(cas.dat$STR, 2) < 20
# conditional mean
mea.sma <- mean(cas.dat[round(cas.dat$STR, 2) < 20, "score"])
mea.sma
## [1] 657.3513
mea.lar <- mean(cas.dat[round(cas.dat$STR, 2) >= 20, "score"])
mea.lar
## [1] 649.9788
# conditional sd
sd.sma <- sd(cas.dat[round(cas.dat$STR, 2) < 20, "score"])
sd.sma
## [1] 19.35801
sd.lar <- sd(cas.dat[round(cas.dat$STR, 2) >= 20, "score"])
sd.lar
## [1] 17.85336
sd.all <- sd(cas.dat[, "score"])
# number of observations
nn.sma <- length(cas.dat[round(cas.dat$STR, 2) < 20, "score"])
nn.sma
## [1] 238
nn.lar <- length(cas.dat[round(cas.dat$STR, 2) >= 20, "score"])
nn.lar
## [1] 182
nn.all <- length(cas.dat[, "score"])
nn.all
## [1] 420

Regression results

# recoding of STR into binary variable
cas.dat$small <- round(cas.dat$STR,2) < 20
# call wrapper function lm_ct_fun (here, using robust ses)
lm.ct.rob.01.dd <- lm_ct_fun(score ~ small, data = cas.dat, hc.type = "HC1")
ct.rob.01.dd <- lm.ct.rob.01.dd$ct
ct.rob.01.dd
## 
## t test of coefficients:
## 
##             Estimate Std. Error  t value  Pr(>|t|)    
## (Intercept) 649.9788     1.3229 491.3317 < 2.2e-16 ***
## smallTRUE     7.3724     1.8236   4.0428 6.288e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Hypothesis

\[\begin{align} H_0&: \quad \beta_{1} = 0 \quad \text{vs.} \\ H_1&: \quad \beta_{1} \neq 0 \quad \alpha = 5\% \end{align}\]

Decision

Use \(t-\text{value}\) (based on robust SE’s):

  • \(|t^{act}| = 4.043\)
  • \(\Rightarrow |t^{act}| > 1.96\)

Use \(p-\text{value}\) (based on robust SE’s):

  • \(p-\text{value} = 6.29\times 10^{-5}\)
  • \(p-\text{value} < 0.05\)

\(\Rightarrow\) Statistically significant \(score\) difference between small and large classes.

Note: compare the results to the statistics review video.

Add your interpretation here!

Task 2


Earnings and education

The data set CPS_01_data.txt contains information about earnings (in USD) and years of education of 2731 full-time workers.


Variables

Variable Description
\(ahe\) Average hourly earnings.
\(yrseduc\) Number of years of education.

For more information see the example of Stock and Watson (2020) on page 193.


Preparations

Load the data, construct the variables, and obtain the estimation results.


Load data set

# use absolute or relative path to load data
cps.dat <- read.table(file = "CPS_01_data.txt",
                      header = TRUE,
                      sep = ",",
                      colClasses = c("numeric","numeric"))

Estimation results No 01

# estimate linear model
lm.res.02 <- lm(ahe ~ yrseduc, data = cps.dat)
# show results
lm.res.02

The estimation results below include robust standard errors which are not part of the results from the lm() function above.

\(\Rightarrow\) See: implementation below!

## 
## ============================
##            Model 1          
## ----------------------------
## intercept  -12.12 (1.36) ***
## yearseduc    2.37 (0.10) ***
## ----------------------------
## R^2          0.18           
## SER         11.24           
## ============================
## *** p < 0.001; ** p < 0.01; * p < 0.05

Task a)

Load the data set into your statistics software and provide a scatter plot of \(ahe\) (average hourly earnings) vs. \(yrseduc\) (years of education).


Scatter Plot

# plot parameters
par(mfrow=c(1, 1),
    mar=c(5, 5, 3, 2))
# scatter plot
plot(cps.dat$yrseduc, cps.dat$ahe,
     main = "Scatter Plot of Earnings and Years of Education", 
     xlab = "Years of education (X)",
     ylab = "Average hourly earnings (Y)",
     xlim = c(0, max(cps.dat$yrseduc)))

  • The values of the variables \(ahe\) and \(yrseduc\) are restricted to positive values.
  • Furthermore, the variable \(yrseduc\) is also restricted to integer values.
  • There seems to be a positive relationship between \(yrseduc\) and \(ahe\).
  • However, the variance of \(ahe\) seems to increase with increasing values of \(yrseduc\).

Add your interpretation here!

Task b)

Run a regression of \(ahe\) on \(yrseduc\). State the underlying assumptions of a causal regression. Do you think these assumptions are valid in this case?


Estimate linear regression model

# Estimate linear regression model
lm.res.02 <- lm(ahe ~ yrseduc, data = cps.dat)
summary(lm.res.02)
## 
## Call:
## lm(formula = ahe ~ yrseduc, data = cps.dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -27.251  -6.757  -2.087   4.249  84.709 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -12.11631    1.35518  -8.941   <2e-16 ***
## yrseduc       2.37404    0.09553  24.851   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.24 on 2729 degrees of freedom
## Multiple R-squared:  0.1845, Adjusted R-squared:  0.1842 
## F-statistic: 617.6 on 1 and 2729 DF,  p-value: < 2.2e-16

Scatter Plot with Fitted Regression Line

# plot parameters
par(mfrow=c(1, 1),
    mar=c(5, 5, 3, 2))
# scatter plot
plot(cps.dat$yrseduc, cps.dat$ahe,
     main = "Scatter Plot of Earnings and Years of Education", 
     xlab = "Years of education (X)",
     ylab = "Average hourly earnings (Y)",
     xlim = c(0, max(cps.dat$yrseduc)),
     ylim = c(-12.5, max(cps.dat$ahe)))
abline(lm.res.02, col = "red", lty = 2)

Least Squares Assumption for causal inference

  1. The conditional distribution of \(u\) given \(X=x\) has mean zero, i.e., \(\mathrm{E}\left(u|X=x\right)=0\)
    • Problem: Omitted variables, e.g., quality of education
    • \(\Rightarrow\) See next chapter!
  2. \(\left(X_{i},Y_{i}\right)\) for \(i=1,...,n\) are i.i.d.
  3. Large outliers in \(X\) and/or \(Y\) are rare.
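Assumption 3 can be inspected informally with summary statistics and a plot (a sketch, assuming cps.dat has been loaded as above):

```r
# informal outlier check for assumption 3
# (assumes cps.dat loaded as in the Preparations above)
summary(cps.dat$ahe)
# a boxplot flags unusually large earnings observations
boxplot(cps.dat$ahe, horizontal = TRUE, main = "Average hourly earnings")
```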

Add your interpretation here!

Task c)

Discuss the usage of heteroskedasticity-robust standard errors.


Scatter Plot of Residuals

# plot parameters
par(mfrow=c(1, 1),
    mar=c(5, 5, 3, 2))
# Residuals plot
plot(cps.dat$yrseduc, lm.res.02$residuals,
     main = "Scatter Plot of Residuals (ahe) and Years of Education", 
     xlab = "Years of education (X)",
     ylab = "Residuals (ahe) (Y)",
     xlim = c(0, max(cps.dat$yrseduc)))

  • The residuals of a linear model fit, e.g., stored in the object lm.res can be extracted using lm.res$residuals.
  • The fitted values of a linear model fit, e.g., stored in the object lm.res can be extracted using lm.res$fitted.values.

Homoskedasticity vs. heteroskedasticity

  • The variance of \(ahe\) seems to be larger for large values of \(yrseduc\).
  • Homoskedasticity seems to be an invalid assumption.
  • \(\Rightarrow\) Use robust SE’s!
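The visual impression can be complemented by a formal test, e.g., the Breusch-Pagan test from the lmtest package (one of the required packages installed in the Set-Up). This is a sketch, assuming lm.res.02 from Task b is still in the workspace; a small p-value is evidence against homoskedasticity.

```r
# Breusch-Pagan test for heteroskedasticity
library(lmtest)
bptest(lm.res.02)
```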

Add your interpretation here!

Task d)

Use the appropriate standard errors to construct a \(99\%\) confidence interval for the slope coefficient.


Inference based on ordinary SE’s

# call wrapper function lm_ct_fun (here, using ordinary ses)
lm.ct.ord.02 <- lm_ct_fun(ahe ~ yrseduc, data = cps.dat, hc.type = "const")
ct.ord.02 <- lm.ct.ord.02$ct
ct.ord.02
## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) -12.11631    1.35518 -8.9407 < 2.2e-16 ***
## yrseduc       2.37404    0.09553 24.8514 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Inference based on robust SE’s (w.r.t. heteroskedasticity)

# call wrapper function lm_ct_fun (here, using robust ses)
lm.ct.rob.02 <- lm_ct_fun(ahe ~ yrseduc, data = cps.dat, hc.type = "HC1")
ct.rob.02 <- lm.ct.rob.02$ct
ct.rob.02
## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) -12.11631    1.35557 -8.9381 < 2.2e-16 ***
## yrseduc       2.37404    0.10244 23.1744 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Confidence interval based on ordinary SE’s

# 99% ci bounds based on homoskedasticity
ci.99.up.ord.02 <- ct.ord.02[,1] + 2.58*ct.ord.02[,2] # upper bound
ci.99.lo.ord.02 <- ct.ord.02[,1] - 2.58*ct.ord.02[,2] # lower bound
# results
cbind(ci.99.lo.ord.02,ci.99.up.ord.02)
##             ci.99.lo.ord.02 ci.99.up.ord.02
## (Intercept)      -15.612672       -8.619950
## yrseduc            2.127578        2.620511

Confidence interval based on robust SE’s (w.r.t. heteroskedasticity)

# 99% ci bounds based on heteroskedasticity
ci.99.up.rob.02 <- ct.rob.02[,1] + 2.58*ct.rob.02[,2] # upper bound
ci.99.lo.rob.02 <- ct.rob.02[,1] - 2.58*ct.rob.02[,2] # lower bound
# results
cbind(ci.99.lo.rob.02,ci.99.up.rob.02)
##             ci.99.lo.rob.02 ci.99.up.rob.02
## (Intercept)      -15.613689       -8.618933
## yrseduc            2.109743        2.638346

Use robust standard errors

\[\begin{align} CI &= \left[\widehat{\beta}_1\pm 2.58\times SE\left(\widehat{\beta}_1\right)\right] \\ &= \left[2.374\pm 2.58\times 0.102\right] \\ &= \left[2.11,2.638\right] \end{align}\]

The \(99\%\) confidence interval for the slope coefficient does not include zero.

Add your interpretation here!

Task e)

Test if the coefficient of \(yrseduc\) is significantly different from 2 (\(\alpha=0.05\)).


Interpretation

General (for slope parameter)

\[\begin{align} H_0&: \quad \beta_{1} = \beta_{1,0} \quad \text{vs.} \\ H_1&: \quad \beta_{1} \neq \beta_{1,0} \quad \alpha = 5\% \end{align}\]


Here

\[\begin{align} H_0&: \quad \beta_{yrseduc} = 2 \quad \text{vs.} \\ H_1&: \quad \beta_{yrseduc} \neq 2 \quad \alpha = 5\% \end{align}\]

Add your results here!

\[\begin{align} t^{act} &= \frac{\widehat{\beta}_{yrseduc}^{act} - 2}{SE\left(\widehat{\beta}_{yrseduc}\right)} \\ t^{act} &= \frac{2.374 - 2}{0.102} \\ t^{act} &= 3.667 \end{align}\]

Add your results here!

  • \(|t^{act}| > 1.96\)
  • \(H_0\) can be rejected at 5% significance level.
  • Note: Rejection for \(\left(\alpha=0.01\right)\) as \(|t^{act}| > 2.58\).

Add your results here!

\[\begin{align} p-\text{value} &= 2*\Phi\left(-|t^{act}|\right) \\ &= 2*\Phi\left(-3.667\right) \\ &\approx 2.4\times 10^{-4} \end{align}\]
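The same numbers can be obtained directly in R (a sketch, assuming ct.rob.02 from Task d is still in the workspace):

```r
# t statistic and normal-approximation p-value for H0: beta_yrseduc = 2
# (assumes ct.rob.02 from Task d above)
t.act <- (ct.rob.02[2, 1] - 2) / ct.rob.02[2, 2]
t.act
2 * pnorm(-abs(t.act))
```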

Add your results here!


Conclusion

  • \(p\text{-value}<0.01\)
  • \(H_0\) can be rejected at 1% and thus, also at 5% significance level.

Definition \(p\)-value

  • The p-value is the probability of observing a value of \(\widehat{\beta}\) at least as different from \(\beta_{1,0}\) as the estimate actually computed \(\left(\widehat{\beta}^{act}_1\right)\), assuming that the null hypothesis is correct.
  • The significance level \(\alpha\) is the probability of a Type I error, i.e., rejecting \(H_0\) even though it is true.

Add your results here!

ill_hyp_tes_fun(t.act = 3.67, test = "two.sid")


Illustration

Illustration of hypothesis test


CDF Standard Normal Distribution

Source: Stock and Watson (2020), p. 763-764


